Q-DETR: An Efficient Low-Bit Quantized Detection Transformer


[Figure 2.8: panels (a) Real-valued DETR-R50 and (b) 4-bit DETR-R50, each showing decoder.0.co_attn.query, decoder.2.co_attn.query, and decoder.5.co_attn.query.]

FIGURE 2.8

Histograms of the query values q (blue shading) and the corresponding Gaussian PDF curves (red) [136] for the cross-attention of different decoder layers in (a) real-valued DETR-R50 and (b) 4-bit quantized DETR-R50 (baseline). Each Gaussian is generated from the statistical mean and variance of the query values. The query in the quantized DETR-R50 suffers information distortion compared with the real-valued one. Experiments are performed on the VOC dataset [62].
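The comparison behind Figure 2.8 can be sketched in a few lines: estimate the mean and variance of a set of query values, build the Gaussian PDF from those two statistics, and measure how far the empirical histogram departs from it. The sketch below is illustrative only. The function names, the bin count, the L1 gap measure, and the synthetic `queries` list (a stand-in for real query activations, which are not reproduced here) are all assumptions and are not taken from the Q-DETR implementation.

```python
import math
import random
import statistics


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def density_histogram(values, num_bins):
    """Return (bin_centers, densities); densities integrate to 1 over the bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins  # assumes the values are not all identical
    counts = [0] * num_bins
    for v in values:
        # Clamp the index so the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(num_bins)]
    densities = [c / (len(values) * width) for c in counts]
    return centers, densities


def fit_and_compare(values, num_bins=50):
    """Fit a Gaussian from the sample mean/variance and compare it to the histogram."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    centers, densities = density_histogram(values, num_bins)
    fitted = [gaussian_pdf(c, mean, var) for c in centers]
    width = centers[1] - centers[0]
    # Approximate L1 distance between the histogram and the fitted PDF
    # (an illustrative choice of gap measure, not one used in the text).
    gap = sum(abs(d - f) for d, f in zip(densities, fitted)) * width
    return mean, var, gap


# Hypothetical stand-in for the query values of one decoder layer.
rng = random.Random(0)
queries = [rng.gauss(0.0, 1.0) for _ in range(10000)]
mean, var, gap = fit_and_compare(queries)
```

Under these assumptions, a small `gap` means the values are well described by the fitted Gaussian, and a larger `gap` indicates a distribution that departs from it. The same routine could be applied to query values collected from a real-valued and a quantized model to compare them layer by layer.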

[Figure 2.9: panels (a) Real-valued DETR-R50 and (b) 4-bit DETR-R50.]

FIGURE 2.9

Spatial attention weight maps in the last decoder of (a) real-valued DETR-R50 and (b) 4-bit quantized DETR-R50. The rectangle denotes the ground-truth bounding box. Following [169], the highlighted area denotes the large attention weights in the four selected heads associated with bound prediction. Whereas the real-valued model focuses on the ground-truth bounds, the attention of the quantized DETR-R50 deviates significantly from them.